ggridges::geom_density_ridges

Function of the Week

Overlays several density plots and fills in the area under the curve.
Author

Jack Hindleuy

Published

January 30, 2025

1 ggridges::geom_density_ridges()

In this document, I will introduce the geom_density_ridges() function and show what it’s for.

library(tidyverse)
library(palmerpenguins)
data(penguins)
# Install ggridges first if needed: install.packages("ggridges")
library(ggridges)

1.1 What is it for?

Imagine you have a continuous variable that you can turn into a density plot, such as blood pressure, height, or weight. Many such variables are roughly bell-shaped when measured in a large enough population.

The function geom_density_ridges() stacks density plots for several subcategories on a shared x axis, with partial overlap, so you can compare them directly. The grouping variable could be gender, species, city, month, or year.

Such a direct comparison shows where the distributions of the subcategories overlap and where they are distinct. If the grouping variable is time, the plot also shows how the distribution of a continuous variable changes or stays the same over time.

1.1.1 geom_density_ridges() Uses ggplot2 Structure

ggplot(dataset) + aes() + geom_density_ridges()

plot_basic <- ggplot(penguins) +
  aes(x = bill_length_mm,
      y = species) +
  geom_density_ridges()

plot_basic
Picking joint bandwidth of 1.08
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).
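The message and the warning printed above have two separate causes: ggridges picks one shared bandwidth for all ridges when none is supplied, and two penguins have no recorded bill length. Here is a minimal sketch, assuming you want a quieter plot: drop the incomplete rows first and pass the bandwidth explicitly. The value 1.08 is simply the one reported in the message above, and penguins_complete and plot_quiet are names introduced here for illustration.

```r
library(ggplot2)
library(ggridges)
library(palmerpenguins)

# Keep only rows with a recorded bill length (the warning reports 2 missing).
penguins_complete <- penguins[!is.na(penguins$bill_length_mm), ]

plot_quiet <- ggplot(penguins_complete) +
  aes(x = bill_length_mm,
      y = species) +
  # Supplying bandwidth means ggridges does not have to pick one itself.
  geom_density_ridges(bandwidth = 1.08)

plot_quiet
```

The ridges should look the same as in plot_basic, since the bandwidth and the plotted rows are unchanged.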

If you want a line along the bottom of each ridge, use geom_density_ridges2() instead. It draws each ridge as a closed shape, so the outline includes the baseline.

plot_basic_2 <- ggplot(penguins) +
  aes(x = bill_length_mm,
      y = species) +
  geom_density_ridges2()

plot_basic_2
Picking joint bandwidth of 1.08
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).

This splits bill length by penguin species, but there are other subgroups we might want to see. We can map them to the color and fill aesthetics inside aes().

penguins %>%
  drop_na(sex) %>% # drop rows where sex is missing
  ggplot() +
  aes(x = bill_length_mm, 
      y = species, 
      color = island, 
      fill = sex) +
  geom_density_ridges()
Picking joint bandwidth of 0.831

This can get a little cluttered. Mapping fill to the same variable as the y axis helps simplify it:

myplot <- ggplot(penguins) +
  aes(x = bill_length_mm, 
      y = species, 
      color = island, 
      fill = species) +
  # The outline colors were mismatched, so they are set by hand with scale_color_manual().
  scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) +
  geom_density_ridges()
myplot
Picking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).

Other arguments to geom_density_ridges():

alpha changes the transparency of the fill color (0 is fully transparent, 1 is fully opaque).

jittered_points = TRUE draws the individual observations that make up each distribution curve.

myplot <- ggplot(penguins) +
  aes(x = bill_length_mm, 
      y = species, 
      color = island, 
      fill = species) +
  scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) +
  geom_density_ridges(alpha = 0.5, jittered_points = TRUE)
myplot
Picking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).

Finally, if the ridges overlap too much, you can adjust their height with the scale argument. A scale of 1 means the tallest ridge just touches the baseline of the ridge above it. Smaller values leave a gap, and larger values create overlap.

myplot <- ggplot(penguins) +
  aes(x = bill_length_mm, 
      y = species, 
      color = island, 
      fill = species) +
  scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) +
  geom_density_ridges(alpha = 0.5, jittered_points = TRUE, scale = 0.95)
myplot
Picking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).

Sometimes a larger scale combined with transparency makes it easier to see where the distributions meet.

myplot <- ggplot(penguins) +
  aes(x = bill_length_mm, 
      y = species, 
      color = island, 
      fill = species) +
  scale_color_manual(values = c("Biscoe" = "blue", "Dream" = "green", "Torgersen" = "red")) +
  geom_density_ridges(alpha = 0.5, jittered_points = TRUE, scale = 5)
myplot
Picking joint bandwidth of 1.11
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density_ridges()`).

1.2 Is it helpful?

The geom_density_ridges() function from the ggridges package is helpful in observational studies that collect a continuous measurement across several subgroups.

One interesting graph I saw made with this function compared data over time intervals: the y axis listed each month of a year, and the x axis showed how temperatures varied within each of those months.
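A plot along those lines can be sketched with the lincoln_weather dataset that ships with ggridges, assuming its Month and Mean Temperature [F] columns. This is not necessarily the graph described above, only an illustration of the same idea. The name plot_weather and the alpha and scale values are choices made here for the example.

```r
library(ggplot2)
library(ggridges)

# lincoln_weather is bundled with ggridges: one row per day, grouped here by Month.
plot_weather <- ggplot(lincoln_weather) +
  aes(x = `Mean Temperature [F]`,  # backticks are needed for the non-standard column name
      y = Month) +
  # scale > 1 lets neighboring months overlap; alpha keeps the overlap readable.
  geom_density_ridges(alpha = 0.5, scale = 2) +
  labs(x = "Mean temperature (F)", y = "Month")

plot_weather
```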

I think these graphs can be quite informative and helpful.